Supporting Imprecision in Database Systems

Author

Ullas Nambiar

Abstract

A query against incomplete or imprecise data in a database, or a query whose search conditions are imprecise, can both result in answers that do not satisfy the query completely. Such queries can be broadly termed imprecise queries. Today’s database systems are designed largely for precise queries against a database of precise and complete data. Range queries (e.g., Age BETWEEN 20 AND 30) and disjunctive queries (e.g., Name=“G. W. Bush” OR Name=“George Bush”) do allow for some imprecision in queries. However, these extensions to precise queries cannot fully capture the expressiveness of an imprecise query. Supporting imprecise queries (e.g., Model like “Camry” and Price around “$15000”) over databases necessitates a system that integrates a similarity search paradigm over structured and semi-structured data. Because today’s relational database systems are designed to support precise queries against precise data, they rely on precise access mechanisms such as indexing, hashing, and sorting. These mechanisms support fast selective searches of records within a table and joins of two tables based on exact matching of values in the join fields. Imprecise search conditions render such access mechanisms largely useless. Thus, supporting imprecise queries over existing databases would require adding support for imprecision within the query engine and in metadata management schemes such as indexes. Extending a database to support imprecise queries would therefore involve changing the query processing and data storage models it uses. However, databases are generally used by other applications and must therefore retain their behaviour, which could become a key inhibitor to any technique that relies on modifying the database to enable support for imprecision.
For example, changing an airline reservation database would necessitate changes to other connected systems, including travel agency databases and partner airline databases. Even if the database is modifiable, a domain expert and/or end user would still be required to provide the necessary distance metrics and domain ontology. Domain ontologies do not exist for all possible domains, and those that are available are far from complete. Therefore, a feasible solution for answering imprecise queries should neither assume the ability to modify the properties of the database nor require users (both lay and expert) to provide much domain-specific information.
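
As a rough illustration of the constraint stated in the abstract (answering an imprecise query without modifying the database), the sketch below rewrites the example query Model like “Camry” and Price around “$15000” into an ordinary precise range query and then ranks the returned rows outside the database. It is not the author's method: the table name `cars`, the function `answer_imprecise`, the 20% price tolerance, the equal weights, and the exact-match stand-in for model similarity are all assumptions made for the example.

```python
import sqlite3


def answer_imprecise(conn, model, price, tolerance=0.2, limit=5):
    """Answer 'Model like <model> AND Price around <price>' with precise SQL only.

    Hypothetical helper: the database sees an ordinary BETWEEN query, so its
    existing indexes and behaviour are left untouched. Ranking happens client-side.
    """
    low = price * (1 - tolerance)
    high = price * (1 + tolerance)
    rows = conn.execute(
        "SELECT model, price FROM cars WHERE price BETWEEN ? AND ?",
        (low, high),
    ).fetchall()

    def score(row):
        row_model, row_price = row
        # Assumption: exact match stands in for a real model-similarity measure.
        model_sim = 1.0 if row_model == model else 0.0
        # Price similarity falls linearly from 1 (exact) to 0 (edge of the range).
        price_sim = 1.0 - abs(row_price - price) / (tolerance * price)
        return 0.5 * model_sim + 0.5 * price_sim

    return sorted(rows, key=score, reverse=True)[:limit]


# Toy data, invented for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cars (model TEXT, price REAL)")
conn.executemany(
    "INSERT INTO cars VALUES (?, ?)",
    [
        ("Camry", 15500.0),
        ("Camry", 21000.0),
        ("Accord", 15000.0),
        ("Corolla", 13000.0),
    ],
)

results = answer_imprecise(conn, "Camry", 15000.0)
for row in results:
    print(row)
```

In this toy run the Camry priced at 21000 falls outside the relaxed range and is never returned, while the Accord at exactly 15000 is returned but ranked below the near-priced Camry. A real system would need a learned or supplied similarity measure in place of the exact-match placeholder, which is the domain-knowledge burden the abstract argues users should not have to carry.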


Similar resources

Supporting Imprecision in Multidimensional Databases Using Granularities

On-Line Analytical Processing (OLAP) technologies are being used widely, but the lack of effective means of handling data imprecision, which occurs when exact values are not known precisely or are entirely missing, represents a major obstacle in applying these technologies in many domains. This paper develops techniques for handling imprecision that aim to maximally reuse existing OLAP modeling...


Probabilistic Databases: Diamonds in the Dirt (Extended Version)

A wide range of applications have recently emerged that need to manage large, imprecise data sets. The reasons for imprecision in data are as diverse as the applications themselves: in sensor and RFID data, imprecision is due to measurement errors [28, 66]; in information extraction, imprecision comes from the inherent ambiguity in natural-language text [32,40]; and in business intelligence, im...


A Probabilistic NF2 Relational Algebra for Imprecision in Databases

We present a probabilistic data model which is based on relations in non-first-normal-form (NF2). Here, tuples are assigned probabilistic weights giving the probability that a tuple belongs to a relation. This way, imprecise attribute values are modelled as a probabilistic subrelation. For information retrieval, the set of weighted index terms of a document can be represented in the same way, thu...


Accessing Imprecise Data: An Approach Based on Intervals

In many real world applications (even in banking), imprecise data is a matter of fact. However, classic database management systems provide little if any help in the management of imprecise data. We are applying methods from interval arithmetic, epsilon serializability, and other related areas to help the application designers in the management of naturally imprecise data. Our approach includes...
